AITopics | chinese character

Collaborating Authors

chinese character

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Glyce: Glyph-vectors for Chinese Character Representations

Yuxian Meng, Wei Wu, Fei Wang, Xiaoya Li, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun, Jiwei Li

Neural Information Processing SystemsFeb-12-2026, 01:04:26 GMT

Neural Information Processing Systems http://nips.cc/

arxiv preprint arxiv, classification task, representation, (13 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > Canada (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Turing Test 2.0: The General Intelligence Threshold

Mappouras, Georgios

arXiv.org Artificial IntelligenceDec-5-2025

With the rise of artificial intelligence (A.I.) and large language models like ChatGPT, a new race for achieving artificial general intelligence (A.G.I) has started. While many speculate how and when A.I. will achieve A.G.I., there is no clear agreement on how A.G.I. can be detected in A.I. models, even when popular tools like the Turing test (and its modern variations) are used to measure their intelligence. In this work, we discuss why traditional methods like the Turing test do not suffice for measuring or detecting A.G.I. and provide a new, practical method that can be used to decide if a system (computer or any other) has reached or surpassed A.G.I. To achieve this, we make two new contributions. First, we present a clear definition for general intelligence (G.I.) and set a G.I. Threshold (G.I.T.) that can be used to distinguish between systems that achieve A.G.I. and systems that do not. Second, we present a new framework on how to construct tests that can detect if a system has achieved G.I. in a simple, comprehensive, and clear-cut fail/pass way. We call this novel framework the Turing test 2.0. We then demonstrate real-life examples of applying tests that follow our Turing test 2.0 framework on modern A.I. models.

information, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.1955

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Education (1.00)
Leisure & Entertainment (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.92)
Media (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Turing's Test (1.00)
(2 more...)

Add feedback

Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves

Wan, Zihao, Xu, Pau Tong Lin, Luo, Fuwen, Wang, Ziyue, Li, Peng, Liu, Yang

arXiv.org Artificial IntelligenceNov-4-2025

While Vision-language Models (VLMs) have demonstrated strong semantic capabilities, their ability to interpret the underlying geometric structure of visual information is less explored. Pictographic characters, which combine visual form with symbolic structure, provide an ideal test case for this capability. We formulate this visual recognition challenge in the mathematical domain, where each character is represented by an executable program of geometric primitives. This is framed as a program synthesis task, training a VLM to decompile raster images into programs composed of Bézier curves. Our model, acting as a "visual decompiler", demonstrates performance superior to strong zero-shot baselines, including GPT-4o. The most significant finding is that when trained solely on modern Chinese characters, the model is able to reconstruct ancient Oracle Bone Script in a zero-shot context. This generalization provides strong evidence that the model acquires an abstract and transferable geometric grammar, moving beyond pixel-level pattern recognition to a more structured form of visual understanding.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.00076

Country: North America > United States > New Mexico (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

AI based signage classification for linguistic landscape studies

Jiang, Yuqin, Jiang, Song, Algrim, Jacob, Harms, Trevor, Koenen, Maxwell, Lan, Xinya, Li, Xingyu, Lin, Chun-Han, Liu, Jia, Sun, Jiayang, Zenger, Henry

arXiv.org Artificial IntelligenceOct-28-2025

Linguistic Landscape (LL) research traditionally relies on manual photography and annotation of public signages to examine distribution of languages in urban space. While such methods yield valuable findings, the process is time-consuming and difficult for large study areas. This study explores the use of AI powered language detection method to automate LL analysis. Using Honolulu Chinatown as a case study, we constructed a georeferenced photo dataset of 1,449 images collected by researchers and applied AI for optical character recognition (OCR) and language classification. We also conducted manual validations for accuracy checking. This model achieved an overall accuracy of 79%. Five recurring types of mislabeling were identified, including distortion, reflection, degraded surface, graffiti, and hallucination. The analysis also reveals that the AI model treats all regions of an image equally, detecting peripheral or background texts that human interpreters typically ignore. Despite these limitations, the results demonstrate the potential of integrating AI-assisted workflows into LL research to reduce such time-consuming processes. However, due to all the limitations and mis-labels, we recognize that AI cannot be fully trusted during this process. This paper encourages a hybrid approach combining AI automation with human validation for a more reliable and efficient workflow.

ai model, artificial intelligence, optical character recognition, (16 more...)

arXiv.org Artificial Intelligence

2510.22885

Country: North America > United States > Hawaii > Honolulu County > Honolulu (0.29)

Genre: Research Report > New Finding (1.00)

Industry: Consumer Products & Services > Restaurants (0.70)

Technology:

Information Technology > Artificial Intelligence > Applied AI (1.00)
Information Technology > Communications > Mobile (0.68)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.54)

Add feedback

Geometric Structures and Patterns of Meaning: A PHATE Manifold Analysis of Chinese Character Embeddings

Gong, Wen G.

arXiv.org Artificial IntelligenceOct-3-2025

We systematically investigate geometric patterns in Chinese character embeddings using PHATE manifold analysis. Through cross-validation across seven embedding models and eight dimensionality reduction methods, we observe clustering patterns for content words ( 实词) and branching patterns for function words ( 虚词). Analysis of 1000+ characters across 12 semantic domains reveals that geometric complexity correlates with semantic content: meaningful characters exhibit rich geometric diversity while structural radicals collapse into tight clusters. The comprehensive 子-network analysis (123 phrases) demonstrates systematic semantic expansion from fundamental element character. These findings provide computational evidence supporting traditional linguistic theory and establish a novel framework for geometric analysis of semantic organization.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.0123

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine (1.00)
Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.95)

Add feedback

Glyce: Glyph-vectors for Chinese Character Representations

Yuxian Meng, Wei Wu, Fei Wang, Xiaoya Li, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun, Jiwei Li

Neural Information Processing SystemsOct-2-2025, 15:33:12 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
Oceania > Australia (0.14)
North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.95)

Add feedback

Speculating LLMs' Chinese Training Data Pollution from Their Tokens

Zhang, Qingjie, Wang, Di, Qian, Haoting, Yan, Liu, Zhang, Tianwei, Xu, Ke, Li, Qi, Huang, Minlie, Li, Hewu, Qiu, Han

arXiv.org Artificial IntelligenceOct-1-2025

Tokens are basic elements in the datasets for LLM training. It is well-known that many tokens representing Chinese phrases in the vocabulary of GPT (4o/4o-mini/o1/o3/4.5/4.1/o4-mini) are indicating contents like pornography or online gambling. Based on this observation, our goal is to locate Polluted Chinese (PoC) tokens in LLMs and study the relationship between PoC tokens' existence and training data. (1) We give a formal definition and taxonomy of PoC tokens based on the GPT's vocabulary. (2) We build a PoC token detector via fine-tuning an LLM to label PoC tokens in vocabularies by considering each token's both semantics and related contents from the search engines. (3) We study the speculation on the training data pollution via PoC tokens' appearances (token ID). Experiments on GPT and other 23 LLMs indicate that tokens widely exist while GPT's vocabulary behaves the worst: more than 23% long Chinese tokens (i.e., a token with more than two Chinese characters) are either porn or online gambling. We validate the accuracy of our speculation method on famous pre-training datasets like C4 and Pile. Then, considering GPT-4o, we speculate that the ratio of "Yui Hatano" related webpages in GPT-4o's training data is around 0.5%.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.17771

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Games (0.94)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Weakly Supervised Fine-grained Span-Level Framework for Chinese Radiology Report Quality Assurance

Wang, Kaiyu, Mu, Lin, Yang, Zhiyao, Li, Ximing, Gao, Xiaotang Zhou Wanfu, Zhang, Huimao

arXiv.org Artificial IntelligenceSep-3-2025

Quality Assurance (QA) for radiology reports refers to judging whether the junior reports (written by junior doctors) are qualified. The QA scores of one junior report are given by the senior doctor(s) after reviewing the image and junior report. This process requires intensive labor costs for senior doctors. Additionally, the QA scores may be inaccurate for reasons like diagnosis bias, the ability of senior doctors, and so on. To address this issue, we propose a Span-level Quality Assurance EvaluaTOR (Sqator) to mark QA scores automatically. Unlike the common document-level semantic comparison method, we try to analyze the semantic difference by exploring more fine-grained text spans. Specifically, Sqator measures QA scores by measuring the importance of revised spans between junior and senior reports, and outputs the final QA scores by merging all revised span scores. We evaluate Sqator using a collection of 12,013 radiology reports. Experimental results show that Sqator can achieve competitive QA scores. Moreover, the importance scores of revised spans can be also consistent with the judgments of senior doctors.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.08876

Country: Asia > China (0.16)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Advancements in Chinese font generation since deep learning era: A survey

Chen, Weiran, Zhu, Guiqian, Li, Ying, Ji, Yi, Liu, Chunping

arXiv.org Artificial IntelligenceAug-12-2025

Chinese font generation aims to create a new Chinese font library based on some reference samples. It is a topic of great concern to many font designers and typographers. Over the past years, with the rapid development of deep learning algorithms, various new techniques have achieved flourishing and thriving progress. Nevertheless, how to improve the overall quality of generated Chinese character images remains a tough issue. In this paper, we conduct a holistic survey of the recent Chinese font generation approaches based on deep learning. To be specific, we first illustrate the research background of the task. Then, we outline our literature selection and analysis methodology, and review a series of related fundamentals, including classical deep learning architectures, font representation formats, public datasets, and frequently-used evaluation metrics. After that, relying on the number of reference samples required to generate a new font, we categorize the existing methods into two major groups: many-shot font generation and few-shot font generation methods. Within each category, representative approaches are summarized, and their strengths and limitations are also discussed in detail. Finally, we conclude our paper with the challenges and future directions, with the expectation to provide some valuable illuminations for the researchers in this field.

artificial intelligence, machine learning, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2508.069

Country: